Abstract

In many eukaryotes, physically linked gene pairs tend to be coexpressed. However, it is still controversial to what extent this 
neighbor coexpression is maintained by selection and to what extent it is nonselective, purely mechanistic "leaky expression." 
Here, we analyze expression patterns of gene pairs that have lost their linkage in the evolution of Saccharomyces cerevisiae 
since its last common ancestor with Kluyveromyces waltii or that were never linked in the S. cerevisiae lineage but became 
neighbors in a related yeast. We demonstrate that coexpression of many linked genes is retained long after their separation 
and is thus likely to be functionally important. In addition, unlinked gene pairs that recently became neighbors in other yeast 
species tend to be coexpressed in S. cerevisiae. This suggests that natural selection often favors chromosomal 
rearrangements in which coexpressed genes become neighbors. Contrary to previous suggestions, selectively favorable 
coexpression appears not to be restricted to bidirectional promoters. 
Introduction

A gene's expression pattern is influenced by its genomic lo- 
cation, both in prokaryotes and eukaryotes. In prokaryotes, 
neighboring genes often form operons, resulting in tight co- 
expression of neighboring genes. In eukaryotes, physically 
linked gene pairs also show higher coexpression than ran- 
domly chosen gene pairs (Cohen et al. 2000; Kruglyak 
and Tang 2000; Lercher et al. 2002, 2003; Williams and 
Hurst 2002; Hurst et al. 2004; Singer et al. 2005; Lercher 
and Hurst 2006; Semon and Duret 2006; Batada et al. 
2007; Kensche et al. 2008). For example, in yeast, adjacent 
gene pairs show correlated expression regardless of their rel- 
ative orientation (Cohen et al. 2000; Kruglyak and Tang 
2000), and this coexpression relationship spans up to 30 
neighboring genes (Lercher and Hurst 2006). In the worm 
Caenorhabditis elegans, many coexpressed genes are orga- 
nized into operons (Lercher et al. 2003). In the mouse ge- 
nome, both immune system genes and tissue-specific genes 
are found to be expressed in clusters (Williams and Hurst 
2002). In the human genome, housekeeping genes also 
show strong clustering (Lercher et al. 2002). Based on their 
apparent evolutionary conservation, it has been proposed 
that such coexpression clusters are selectively favorable in 



mammals (Singer et al. 2005). However, a later report found 
that highly coexpressed gene pairs are more likely to be bro- 
ken up by rearrangements, concluding that neighbor coex- 
pression is in fact generally disadvantageous in mammals 
(Liao and Zhang 2008). 

The coexpression of neighboring genes in prokaryotic op- 
erons is conceptually simple. In eukaryotes, a range of 
mechanisms has been proposed to be responsible for the 
coexpression of closely spaced genes. Coexpressed neigh- 
boring genes in divergently transcribed orientation suggest 
that bidirectionally active promoters play a role in regulating 
coexpression (Cohen et al. 2000; Kruglyak and Tang 2000; 
Kensche et al. 2008), although such "bipromoters" may also 
serve to reduce stochastic gene expression noise (Wang 
et al. 2011). Chromatin structure also likely has an impact
on the coexpression of closely located genes (Hurst et al. 
2004; Batada et al. 2007; Chen et al. 2010). Finally, shared
transcription factors and failure of transcription termination
("transcriptional read-through") have also been reported to
contribute to the coexpression of neighboring genes (Semon and
Duret 2006; Batada et al. 2007; Michalak 2008). The observation
that neighboring genes often have similar functions has led to
the proposal that the coexpression of linked genes may be related
to gene function (Cohen et al. 2000; Michalak 2008).

Thus, neighboring eukaryotic genes tend to be coex- 
pressed. But is this coexpression really selectively favorable, 
or is it a nonselective, purely mechanistic by-product of ge- 
nomic neighborhood, as suggested at least for mammals 
(Liao and Zhang 2008)? If neighbor coexpression is indeed 
functional, then coexpression should be maintained even if 
the neighborhood is broken up by genomic rearrangements. 
Here, we compare the effects of current and ancestral gene 
order on current gene expression patterns in the yeast Sac- 
charomyces cerevisiae. In particular, we show that gene 
pairs that were genomic neighbors in the evolutionary past, 
but are separated now, show higher coexpression than randomly
chosen gene pairs. As nonselective neighbor coexpression
should cease after breaking up the neighborhood, our
results indicate a significant role of natural selection in the 
coexpression of linked yeast genes. 

Materials and Methods 

Data Sources 

The S. cerevisiae gene order as well as the ancestral gene
order were taken from the yeast gene order database (Byrne 
and Wolfe 2005). We only retained genes with known po- 
sitions in both data sets for further analysis. 

Saccharomyces cerevisiae genome sequence data were 
downloaded via ftp from the Saccharomyces Genome 
Database (ftp://genome-ftp.stanford.edu/pub/yeast/). An- 
cestral gene order from eight reconstructed chromosomes 
was obtained from Gordon et al. (2009). Information on 
gain and loss of neighborhood along the yeast phylogeny 
was taken from Kensche et al. (2008). 

The divergent gene pairs that share a promoter (bipro- 
moter pairs) were taken from genome-wide tiling array ex- 
periments (Xu et al. 2009). 

Expression Data 

We employed the same expression data used by Batada, Ur- 
rutia, and Hurst to assess the influence of chromatin remod- 
eling on the coexpression of neighboring yeast genes 
(Batada et al. 2007). Briefly, coexpression was averaged 
across 23 large-scale time course messenger RNA (mRNA) 
expression data sets, each covering at least 10 different time
points. For each data set, Pearson's product moment corre- 
lation coefficient was calculated between the expression 
vectors of any two genes; for a given gene pair, coexpression 
was then defined as the mean value across data sets (for 
details, see supplementary information 1 of Batada et al. 
2007). 
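
The averaging step described above can be sketched as follows. This is a minimal illustration in plain Python, not the original analysis code; the function names, the input layout (one dictionary of expression vectors per data set), and the toy values are assumptions made for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's product moment correlation of two equal-length vectors.

    Assumes both vectors are non-constant (otherwise the denominator is 0).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_coexpression(datasets, gene_a, gene_b):
    """Mean of the per-data-set correlations between two genes.

    `datasets` is a list of dicts mapping gene name -> expression vector
    (one value per time point). Data sets lacking either gene are skipped;
    that skipping rule is an assumption of this sketch.
    """
    rs = [pearson(d[gene_a], d[gene_b])
          for d in datasets if gene_a in d and gene_b in d]
    return sum(rs) / len(rs) if rs else float("nan")

# Toy input with hypothetical gene names and made-up values:
# perfectly correlated in one data set, perfectly anticorrelated in the other.
toy = [
    {"gA": [1.0, 2.0, 3.0, 4.0], "gB": [2.0, 4.0, 6.0, 8.0]},
    {"gA": [1.0, 2.0, 3.0, 4.0], "gB": [4.0, 3.0, 2.0, 1.0]},
]
print(mean_coexpression(toy, "gA", "gB"))
```

Averaging per-data-set correlations, rather than correlating concatenated vectors, keeps differences in scale between experiments from dominating the result.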

Some of these data sets were obtained using cDNA mi- 
croarrays, which may have spotted chromosomal neighbors 
onto neighboring microarray spots. It is hence possible that 



coexpression of genes neighboring in the current S. cerevisiae
genome was overestimated due to experimental artifacts
(Lercher and Hurst 2006). To address this issue, we
repeated part of our analyses using coexpression derived
from a set of 1,370 Affymetrix microarray experiments from
the National Center for Biotechnology Information Gene Ex- 
pression Omnibus (GEO) database (GEO accession IDs are 
listed in supplementary table S1, Supplementary Material 
online). We renormalized the log2-transformed expression 
values across all microarrays using the "aroma.light" package
in BioConductor (Gentleman et al. 2004). Pairwise mRNA co- 
expression between two genes was again calculated as Pear- 
son's product moment correlation coefficient across 
experiments. Results based solely on Affymetrix microar- 
rays were qualitatively very similar to those presented in 
the main text (see supplementary results S1, Supplemen- 
tary Material online). 

Tandemly duplicated genes can lead to overestimation of 
the coexpression of neighboring genes. To avoid biases 
caused by tandem duplications, we removed all such pairs 
from our analyses. Tandem duplicates were identified as 
neighbors in the S. cerevisiae genome with Blast e value
<0.01 (Batada et al. 2007). 

Evolutionary Conservation of Bipromoter Gene 
Pairs 

To test if bipromoter gene pairs are more conserved than other 
divergent gene pairs, we employed the reconstructed ancestral
gene order published by the Wolfe lab
(http://wolfe.gen.tcd.ie/ygob/). Only genes annotated in both the
S. cerevisiae genome and the ancestor genome were used. 
Divergent gene pairs in the S. cerevisiae genome were marked 
as "ancestral" if they were direct chromosomal neighbors in 
the reconstructed ancestor and as "new" otherwise. 
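
The labeling rule above can be illustrated with a short sketch. It assumes each genome is given as a list of chromosomes, each a list of gene names in chromosomal order. The function names are hypothetical and gene orientation is ignored for brevity. Genes missing from either genome are dropped before adjacency is evaluated, which is one possible reading of the filtering step and not necessarily the one used in the original analysis.

```python
def adjacent_pairs(chromosomes):
    """Set of unordered direct-neighbor pairs over all chromosomes."""
    pairs = set()
    for genes in chromosomes:
        for left, right in zip(genes, genes[1:]):
            pairs.add(frozenset((left, right)))
    return pairs

def label_pairs(current, ancestor):
    """Label every current neighbor pair as 'ancestral' or 'new'."""
    shared = ({g for chrom in current for g in chrom}
              & {g for chrom in ancestor for g in chrom})
    # Keep only genes annotated in both genomes (assumption: removing a
    # gene makes its two flanking genes count as neighbors).
    current = [[g for g in chrom if g in shared] for chrom in current]
    ancestor = [[g for g in chrom if g in shared] for chrom in ancestor]
    old = adjacent_pairs(ancestor)
    return {tuple(sorted(p)): ("ancestral" if p in old else "new")
            for p in adjacent_pairs(current)}

# Toy genomes with made-up gene names.
ancestor = [["a", "b", "c", "d"]]
current = [["a", "b", "d"], ["c", "e"]]
print(label_pairs(current, ancestor))
```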

A simple logistic regression model was utilized to deter- 
mine if bipromoter gene pairs are still more conserved than 
non-bipromoter pairs after controlling for coexpression level 
and intergenic distance, which is the strongest known pre- 
dictor for linkage breakup (Poyatos and Hurst 2007). We 
used the following model: 

z ~ orient + coexpr + igd,
with 

z: 1 = ancestral pair, 0 = new pair; 
orient: 0 = bipromoter divergent pair, 1 = non- 
bipromoter divergent pair; 
coexpr: coexpression level; 
igd: intergenic distance. 
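
Written out, a model formula of this kind corresponds to the standard logistic regression form below. The coefficient symbols are notation introduced here for illustration and do not appear in the original text.

```latex
\[
\log\frac{\Pr(z = 1)}{1 - \Pr(z = 1)}
  = \beta_0 + \beta_1\,\mathrm{orient}
  + \beta_2\,\mathrm{coexpr} + \beta_3\,\mathrm{igd}
\]
```

Under this reading, a coefficient \(\beta_1\) that differs significantly from zero would indicate that bipromoter status is associated with conservation beyond what coexpression level and intergenic distance account for.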

Intergenic distance was measured as the distance in base 
pairs between the transcription start sites of genes. Calcu- 
lations were performed in the R environment for statistical 
computing. The model shows that after controlling for coexpression
and intergenic distance, bipromoter status still has a highly
significant effect on gene pair conservation (P = 2.9 × 10^-7);
the effect of orientation was indeed much stronger than the
influence of the other two variables (data not shown).

Table 1

Only Divergent Gene Pairs Show Higher Coexpression in Ancient Compared with New Neighbors (P Values from Brunner-Munzel Tests)

                    Neighbors in Ancestor and        Neighbors in
                    S. cerevisiae                    S. cerevisiae only
Orientation         Mean     N       Distance (bp)   Mean     N       Distance (bp)   P^a       P^b
Divergent (← →)     0.14     738     372             0.11     502     682             0.00064   0.010
Convergent (→ ←)    0.11     708     3,558           0.11     561     3,709           0.95      0.61
Cooriented (→ →)    0.084    1,140   2,085           0.084    1,090   2,226           0.85      0.78

a P value from Brunner-Munzel test.

b P value from a logistic regression model to test if coexpression still has an impact on gene pair conservation after controlling for intergenic distance (details analogous to the
model described for bipromoter pairs in Materials and Methods).

Dollo Parsimony Method to Calculate Ancestral 
States 

Phylogenetic relationships of 19 yeasts and the neighboring
gene pairs of orthologs in these fungi were downloaded 
from the supplementary file of Kensche et al. (2008). We 
used the Dollo parsimony method implemented in PAUP* 
(Wilgenbusch and Swofford 2003) to calculate the ancestral 
neighborhood state of each gene pair. This algorithm pro- 
vided us with gain and loss information of gene neighbor- 
hood relationships at each internal node. 
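
The logic of Dollo parsimony for a single binary character (here, "gene pair is neighboring") can be sketched as follows. Under Dollo assumptions the character is gained at most once and may be lost repeatedly. An internal node is therefore inferred to carry the character only if it lies at or below the single origin and has at least one descendant leaf that carries it. This is a simplified illustration with a hypothetical tree and function names, not the PAUP* implementation, whose options and output may differ.

```python
def dollo_states(children, root, present_leaves):
    """Infer presence (True) or absence (False) at every node of a rooted tree.

    `children` maps each internal node to its list of children; leaves do
    not appear as keys. `present_leaves` is the set of leaves that carry
    the character.
    """
    below = {}  # node -> does the subtree contain a leaf with the character?

    def fill(node):
        kids = children.get(node, [])
        if not kids:
            below[node] = node in present_leaves
        else:
            # Build a list first so every child subtree is visited.
            below[node] = any([fill(k) for k in kids])
        return below[node]

    fill(root)
    state = {node: False for node in below}
    if not below[root]:
        return state
    # The single gain sits at the deepest node that still covers all
    # character-carrying leaves.
    origin = root
    while True:
        carrying = [k for k in children.get(origin, []) if below[k]]
        if len(carrying) != 1:
            break
        origin = carrying[0]
    stack = [origin]
    while stack:
        node = stack.pop()
        state[node] = True
        stack.extend(k for k in children.get(node, []) if below[k])
    return state

def dollo_losses(children, state):
    """Branches (parent, child) on which the character is lost."""
    return [(p, k) for p, kids in children.items()
            for k in kids if state[p] and not state[k]]

# Hypothetical four-species tree: ((A, B), C), D
tree = {"root": ["n1", "D"], "n1": ["n2", "C"], "n2": ["A", "B"]}
states = dollo_states(tree, "root", {"A", "C"})
print(states)
print(dollo_losses(tree, states))
```

In the toy tree the pair is neighboring in species A and C, so the gain is placed on the branch leading to node n1 and a single loss is inferred on the branch leading to B.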

Correlation between Age of Separation and 
Coexpression 

To test if coexpression is higher for ancestrally linked gene 
pairs that were separated more recently, we also used the 
information on ancestral gene order and separation ages 
derived from Kensche et al. (2008) as described above. 
We did not find a significant correlation between separation 
age and coexpression level (Pearson's product moment cor- 
relation coefficient: R = 0.087, P = 0.14; Spearman's rank 
correlation coefficient: ρ = -0.044, P = 0.46).

Neighborhood Conservation Predicts 
Coexpression only for Divergent Gene 
Pairs 

We used the recently reconstructed gene order of the pre- 
whole-genome duplication yeast ancestor (Gordon et al. 
2009), which is believed to be about 100-150 My old (Sugino
and Innan 2005). We first compared the coexpression of 
gene pairs that are conserved between the ancestor and 
S. cerevisiae with the coexpression of gene pairs newly 
formed in S. cerevisiae. Here, coexpression of two genes
is defined as the correlation of gene expression values across 
a large data set of time series experiments (Kafri et al. 2005; 



Batada et al. 2007). To avoid potential biases caused by tan- 
demly duplicated genes, such gene pairs were removed 
prior to all analyses. 

Three possible scenarios exist: 1) If the conserved gene 
pairs are less likely to be coexpressed compared with newly 
formed gene pairs, then highly coexpressed neighboring 
gene pairs may be generally disadvantageous, as was ob- 
served recently in mammals (Liao and Zhang 2008). 2) If 
conserved gene pairs share similar coexpression profiles with 
newly formed gene pairs, then neighbor coexpression is 
likely to be largely selectively neutral. 3) If the conserved 
gene pairs generally show higher coexpression levels com- 
pared with newly formed gene pairs, then this indicates that 
neighbor coexpression is generally advantageous, as previ- 
ously suggested (Singer et al. 2005). 

Table 1 shows the results of this comparison. For divergently
oriented S. cerevisiae gene pairs (← →), those that
were already in this orientation in the ancestral genome
show higher coexpression compared with newly formed di- 
vergent gene pairs. No such difference between conserved 
and new pairs was found for convergent or cooriented gene 
pairs. This indicates that in yeast, only divergent gene pairs 
are under selection for high coexpression. Surprisingly, there 
is no difference between the coexpression of newly formed 
divergent gene pairs and convergent gene pairs (P = 0.59 
comparing new divergent gene pairs with conserved conver- 
gent gene pairs and P = 0.59 comparing new divergent 
gene pairs with newly formed convergent gene pairs, Brun- 
ner-Munzel tests). Thus, divergent gene pairs do not always 
show higher coexpression compared with other types of ad- 
jacent gene pairs in yeast. These results are not a consequence 
of variation in intergenic distance, which is known to be the 
strongest predictor of gene neighborhood conservation in 
yeast (Poyatos and Hurst 2007): the effect of neighborhood 
conservation status remains qualitatively unchanged when 
using both conservation status and intergenic distances as 
predictors in a logistic regression model (table 1). 

The observed difference between ancestral and young di- 
vergent gene pairs is likely related to the activity of bidirec- 
tionally active promoters (bipromoters) (Kruglyak and Tang 
2000). Gene pairs newly formed by rearrangements will 
rarely be controlled by bipromoters. We identified gene pairs
regulated by bipromoters based on published data (Xu et al. 
2009). We hypothesized that strong coexpression of diver- 
gent gene pairs is found predominantly for bipromoter pairs. 
Consistent with this prediction, we found that coexpression 
of bipromoter divergent pairs is significantly stronger than 
for non-bipromoter divergent pairs (0.260 ± 0.009 vs. 
0.180 ± 0.0104; P = 6.0 × 10^-9, Brunner-Munzel test). Furthermore,
we found that 469 out of 660 bipromoter gene
pairs (71.1%) were already present in the ancestral genome,
whereas the same is true for only 48.9% of the non-bipro- 
moter divergent gene pairs. This result is again not an arti- 
fact of intergenic distance (Poyatos and Hurst 2007) or of 
the coexpression level of these genes (P = 2.9 × 10^-7 in
a logistic regression model, see Materials and Methods 
for details). More importantly, there is no difference be- 
tween the coexpression level of conserved bipromoter gene 
pairs and new bipromoter gene pairs (0.261 ± 0.011 vs. 
0.256 ± 0.178; P = 0.94, Brunner-Munzel test). 

These results have two important implications. On one 
hand, they suggest that coexpression per se cannot explain 
the conservation of bipromoter structures. On the other 
hand, the results indicate that there is a selective advantage 
for the retention of bipromoter structures. Besides coexpres- 
sion, another conserved function of bipromoters could be to 
reduce transcriptional noise (Wang et al. 2011).

Gene Pairs that Used to be Neighbors 
Are Still Coexpressed 

We next analyzed the 2,765 ancestrally neighboring gene 
pairs that are located on different chromosomes in the cur- 
rent S. cerevisiae genome. On average, these separated pairs 
are significantly more coexpressed compared with 10,000 
randomly chosen gene pairs (P = 5.3 × 10^-6, Wilcoxon rank
sum test). Except for shared cis-regulatory sites, none of the
proposed mechanistic reasons for neighbor coexpression appears
capable of explaining the persistence of coexpression
after separation. Thus, coexpression is likely selectively favor- 
able for these ancestrally linked gene pairs and was thus kept 
or restored after their separation. 
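
A comparison of this kind, separated pairs against a background of randomly drawn pairs, can be sketched in plain Python. The rank sum test below uses the normal approximation without a tie correction of the variance, so it is only a rough stand-in for a statistics package. The function names, the fixed random seed, and the toy coexpression values are assumptions of this example.

```python
import random
from math import erfc, sqrt

def random_pairs(genes, k, seed=0):
    """Draw k random pairs of distinct genes (background set)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(genes, 2)) for _ in range(k)]

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test, normal approximation.

    Tied values receive average ranks. Returns (U statistic of x, P value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, erfc(abs(z) / sqrt(2))

# Made-up coexpression values for separated pairs and random pairs.
separated = [0.21, 0.35, 0.18, 0.40]
background = [0.02, -0.05, 0.10, 0.01]
print(rank_sum_test(separated, background))
```

With real data, the background values would be the coexpression of the pairs returned by `random_pairs`, looked up in the same coexpression table as the separated pairs.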

We do not find significant differences between the coex- 
pression of gene pairs that were ancestrally linked in differ- 
ent relative orientations (P = 0.084 between divergent and 
convergent pairs, P = 0.13 between divergent and coor- 
iented gene pairs, and P = 0.70 between cooriented and 
convergent pairs; Brunner-Munzel tests). Of the known 
factors that specifically affect the coexpression of genomic 
neighbors, only chromatin remodeling acts independently 
of gene orientation. That we find no differences between 
orientations thus appears consistent with the idea that 
ancestral coexpression of these pairs was caused by local 
chromatin remodeling and is now maintained by shared 
cis-regulatory sequences; these cis-regulatory sequences
may affect transcription factor binding as well as local
chromatin remodeling. This maintenance further suggests
that low-level coexpression caused by chromatin remodeling
may in many cases be selectively favorable.

Fig. 1. Gene pairs located on different Saccharomyces cerevisiae
chromosomes but neighboring in the ancestral genome (red) or in
another yeast lineage (green) show slightly higher coexpression than
random gene pairs (black), although coexpression is lower than in gene
pairs that are neighbors in the current S. cerevisiae genome (blue).
Average coexpression is significantly higher than random expectations
for all three types of neighbors (P < 0.0002 in each comparison).
Axis label: coexpression between gene pairs.

The results presented above suggest that at least part of 
the high coexpression level of neighboring yeast gene pairs 
is due to natural selection on coexpression. Thus, genes still 
need to be coexpressed when pairs are separated through 
a genomic rearrangement. To further verify this hypothesis, 
we used recent data based on gene pair conservation across 
19 fungi (Kensche et al. 2008). We reconstructed the gene 
order in the common ancestor of these species using Dollo 
parsimony as implemented in PAUP* (Wilgenbusch and 
Swofford 2003). 

We only analyzed genes that were direct neighbors in the 
ancestral genome but that are now located on different 
chromosomes because genes located nearby on a yeast 
chromosome still show similar expression profiles even 
when separated by tens of genes (Lercher and Hurst 2006). 
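
The selection rule in the preceding paragraph amounts to a simple filter, sketched here under the assumption that the ancestral genome is available as ordered gene lists and that each gene's current chromosome is known. The function name, argument names, and toy data are hypothetical.

```python
def separated_ancestral_pairs(ancestor, chrom_of):
    """Ancestral direct neighbors that now lie on different chromosomes.

    `ancestor`: list of ancestral chromosomes (gene names in order).
    `chrom_of`: dict mapping gene name -> current chromosome label.
    Pairs with a gene missing from `chrom_of` are skipped.
    """
    separated = []
    for genes in ancestor:
        for left, right in zip(genes, genes[1:]):
            if left in chrom_of and right in chrom_of \
                    and chrom_of[left] != chrom_of[right]:
                separated.append((left, right))
    return separated

# Toy example: b and c were neighbors but now sit on chromosomes I and IV.
ancestor = [["a", "b", "c", "d"]]
chrom_of = {"a": "I", "b": "I", "c": "IV", "d": "IV"}
print(separated_ancestral_pairs(ancestor, chrom_of))
```

Pairs that remain on the same chromosome are excluded even if they are no longer adjacent, because residual linkage could still inflate their coexpression.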

As already observed in our first data set, separated gene 
pairs show slightly higher coexpression compared with random
gene pairs (fig. 1; P = 0.00020, Brunner-Munzel test).
Again, there is no difference between the coexpression of 
divergent, convergent, and cooriented ancestral gene pairs 
after their separation (P = 0.18 between divergent and convergent
pairs, P = 0.32 between divergent and cooriented
gene pairs, and P = 0.76 between cooriented and conver- 
gent pairs; Brunner-Munzel tests). 

If gene neighborhood is under positive selection for 
genes that need to be coexpressed, then we would further 
expect that orthologs of non-neighboring coexpressed S.
cerevisiae genes are more likely to become genomic neigh- 
bors in other yeast lineages; consequently, genes neighbor- 
ing in at least one other yeast species but located on 
different chromosomes in both S. cerevisiae and in the common
ancestor should show higher coexpression than random
gene pairs. This is indeed the case (fig. 1; P = 3.3 × 10^-5,
Brunner-Munzel test).

Discussion 

Using ancestral gene order information gained from the 
yeast gene order browser (Byrne and Wolfe 2005), we con- 
firmed that among neighboring gene pairs, divergently ori- 
ented pairs are the ones that were most likely to be 
conserved during genome evolution (Kensche et al. 
2008). More specifically, this conservation is mostly due 
to bipromoter gene pairs. The conservation implicates sta- 
bilizing selection on the relative positioning of this subset of 
the divergently arranged gene pairs. 

After separation of neighboring gene pairs through genomic
rearrangements, we no longer found any difference between
divergent and convergent or cooriented gene pairs; all
three types of ancestrally neighboring gene pairs show higher 
than expected coexpression in S. cerevisiae after their sepa- 
ration through genomic rearrangements. It is possible that 
the two genes in these coexpressed separated pairs had part 
of their cis-regulatory apparatus in common even before their
separation, so that coexpression could be partially maintained 
after the rearrangement; conversely, it may be that coexpres- 
sion was initially lost in the rearrangement and was reinstated 
through cis-regulatory changes afterward.

The coexpression of ancestrally neighboring gene pairs 
that are now located on different chromosomes is sharply re- 
duced compared with the pairs that are neighbors in the cur- 
rent yeast genome (fig. 1). This observation is expected, as 
factors such as chromatin remodeling are known to strongly 
influence the coexpression of linked genes in yeast (Batada 
et al. 2007). Thus, although part of the neighbor coexpression
is likely maintained by natural selection, it is likely that a sub- 
stantial component of neighbor coexpression is nonselective 
"leaky" expression of one or both neighbors. 

Could it be that all coexpression is in fact nonselective 
(purely mechanistic), and separated pairs only show coex- 
pression because part of the mechanistic apparatus shared 
between the two genes is maintained through the separa- 
tion? In particular, many of the linkage losses may be a con- 
sequence of the whole-genome duplication experienced in 
the S. cerevisiae lineage after the common ancestor of the 
yeasts analyzed here. Assume that the common ancestor 
contained a neighboring gene pair A,B together with
a cis-regulatory region C that affects both genes (C-A-B).
The whole-genome duplication will duplicate the complete
set, resulting in C1-A1-B1 and C2-A2-B2. If subsequently A1
and B2 (or A2 and B1) are lost, the now separated genes A
and B retain their identical cis-regulatory region C.

However, it is unlikely that such a scenario explains our 
observations, for at least two reasons. First, if selection
played no role in the maintenance of coexpression, then
coexpression should fade with increasing age of the separation.
This is not the case (Spearman's ρ = -0.044,
P = 0.46; for details, see Materials and Methods). Second, 
and most importantly, a nonselective, purely mechanistic 
model cannot explain why unlinked gene pairs that only 
recently became neighbors in other yeast species are coex- 
pressed in S. cerevisiae. That these pairs show coexpression 
very similar to ancestrally linked pairs (fig. 1) seems only 
compatible with the hypothesis that natural selection pro- 
motes chromosomal rearrangements that bring together 
coexpressed genes. 

When discussing the properties of neighboring gene 
pairs, these are usually classified by their relative orientation 
into three categories — divergent gene pairs (head to head), 
convergent gene pairs (tail to tail), and cooriented gene 
pairs. Those three types of gene pairs appear to have differ- 
ent properties — divergent gene pairs are the most con- 
served and show stronger coexpression than the other 
two orientations (Kensche et al. 2008). Here we show that 
as far as coexpression is concerned, there are essentially only 
two types of neighboring gene pairs in the genome — bipro- 
moter gene pairs and non-bipromoter gene pairs. Bipro- 
moter gene pairs show strong signals of conservation and 
coexpression, whereas non-bipromoter gene pairs do not. 
After separation through genomic rearrangements, ances- 
tral divergent gene pairs no longer exhibit higher coexpres- 
sion compared with other ancestral gene pairs, supporting 
the view that chromatin remodeling dominates the coex- 
pression of most neighboring gene pairs (Batada et al. 
2007). 

In conclusion, we have shown that not only gene neighborhood
in the current S. cerevisiae genome but also gene
order in the ancestral genome and gene order in related 
yeasts are predictive of coexpression. These results support 
a role for natural selection in the establishment and maintenance
of neighbor coexpression in yeast and argue
against a purely mechanistic view that considers neighbor 
coexpression as a neutral (or even slightly deleterious) 
phenomenon. 

Supplementary Material 

Supplementary results, table S1, and figure S1 are available 
at Genome Biology and Evolution online
(http://www.gbe.oxfordjournals.org/).